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Abstract 



We present a technique for spatiotemporal data analysis called nonlinear Laplacian 

>. 

CO . spectral analysis (NLSA), which generalizes singular spectrum analysis (SSA) to take 



into account the nonlinear manifold structure of complex data sets. The key princi- 



ple underlying NLSA is that the functions used to represent temporal patterns should 

o; 

exhibit a degree of smoothness on the nonlinear data manifold M; a constraint ab- 
sent from classical SSA. NLSA enforces such a notion of smoothness by requiring that 
temporal patterns belong in low-dimensional Hilbert spaces Vi spanned by the lead- 
ing I Laplace-Beltrami eigenfunctions on M. These eigenfunctions can be evaluated 
efficiently in high ambient-space dimensions using sparse graph-theoretic algorithms. 
Moreover, they provide orthonormal bases to expand a family of linear maps, whose 
singular value decomposition leads to sets of spatiotemporal patterns at progressively 
finer resolution on the data manifold. The Riemannian measure of M and an adaptive 
graph kernel width enhances the capability of NLSA to detect important nonlinear 



* dimitris@cims.nyu.edu 



processes, including intermittency and rare events. The minimum dimension of Vi 
required to capture these features while avoiding overfitting is estimated here using 
spectral entropy criteria. 

As an application, we study the upper-ocean temperature in the North Pacific sector 
of a 700-year control run of the CCSM3 climate model. Besides the familiar annual and 
decadal modes, NLSA recovers a family of intermittent processes associated with the 
Kuroshio current and the subtropical and subpolar gyres. These processes carry little 
variance (and are therefore not captured by SSA), yet their dynamical role is expected 
to be significant. 
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1 Introduction 



In recent years, there has been a proliferation o 



: data in geophysics and other applied sciences 



2]. and large-scale numerical models {3]. The 



acquired through observations 1], reanalyses 2 
availability of such data holds promise for advances in a number of important applications, 
such as decadal-range climate forecasts through improved understanding of regime transi- 
tions in the ocean |4J and weather forecasts beyond seven-day lead times through skillful 
models of organized convection in the tropics |5|. However, a major obstacle in using large- 
scale data sets in such applications is their sheer volume and complexity. Typically, the data 
are presented in the form of a high-dimensional time series, acquired through noisy, partial 
observations of a strongly-nonlinear dynamical system. Thus, there exists a strong practical 
interest in developing novel data analysis techniques to decompose the data into a set of 
spatiotemporal patterns revealing the operating nonlinear processes. Such patterns can be 
used to gain scientific understanding of complex phenomena, or to build reduced dynamical 
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6] and neural networks 



7} 



models for prediction. 

Machine learning methods, such as those based on kernels 
are well-suited to capture the nonlinear features of data generated by complex systems, but 
in certain cases are prone to overfitting and/or poor scaling with the dimension of ambient 
space [8]. In contrast, classical linear approaches, such as singular spectrum analysis (SSA) 



and its variants 



12], have the advantages of direct physical interpretability and analysis 



through the tools of linear algebra, but are generally limited in detecting patterns carrying a 
high portion of the variance (energy) of the observed signal. Such patterns are likely to miss 
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important nonlinear dynamical features, including intermittency and rare events 
The latter carry low variance, but can play an important dynamical role by triggering large 
scale regime transitions. 



In 



15| (building on preliminary work in 



16j), a method called nonlinear Laplacian spec- 



tral analysis (NLSA) was developed, whose objective is to address the above shortcomings 
by combining aspects of both nonlinear and linear methods. Similarly to SSA, NLSA de- 
composes an observed signal through spectral analysis of linear operators mapping a space 
of temporal modes (the "chronos" space) to the corresponding space of spatial patterns 
("topos" space) [10]. However, the linear operators used in NLSA, denoted here by A 1 , differ 
crucially from SSA in that they are tailored to the nonlinear manifold structure of the data. 
Specifically, the chronos spaces in NLSA are low-dimensional subspaces of the Hilbert space 
L 2 (M,n) of square-integrable functions on the data manifold M, with fi the Riemannian 
measure induced by the embedding of M in n- dimensional ambient space. NLSA employs 



graph-theoretic algorithms 



17 



18] to produce a set of orthonormal Laplace-Beltrami eigen- 
functions, which form bases for a family of /-dimensional subspaces V\ of L 2 (M, /i), with / 
controlling the scale ( "resolution" ) on the data manifold of the temporal patterns described 
by the corresponding A 1 operator. A decomposition of the data into a set of I spatial and 
temporal patterns then follows from singular value decomposition (SVD) of the n x I matrix 
representing A 1 in the eigenfunction basis. 
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A key difference between NLSA and other nonlinear dimensionality reduction techniques 
in the literature is that the eigenfunctions are not used to define feature maps as done, e.g., 
m kernel PCA flQ. Moreover, the diffusion process on the graph employed to eva 
Laplace-Beltrami eigenfunctions has no implied relation to the actual dynamics (cf. 
Rather, the graph Laplacian eigenfunctions are used in NLSA solely as a set of basis functions 
for Vi. The physical, spatiotemporal processes operating on the data manifold are obtained 
in the linear SVD step. Two further important ingredients of the scheme are: 



uate the 
20, 3)- 



22] to address non-Markovianity of the input data due to 



1. Time-lagged embedding 
partial observations; 

2. Adaptive kernels in the construction of the graph Laplacian (with Gaussian widths 
determined from the distances between temporal nearest neighbors), enhancing the 
capability of the algorithm to capture rare events. 

We demonstrate the efficacy of the scheme in an analysis of the North Pacific sector 
of the Community Climate System model version 3 (CCSM3) {23], augmenting the work 
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16] with new practical criteria for choosing the dimension / of the 



temporal spaces 
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25] . we iden- 



Vi. Using a 700-year equilibrated data set of the upper-300 m ocean 
tify a number of qualitatively-distinct spatiotemporal processes, each with a meaningful 
physical interpretation. These include the seasonal cycle, semiannual variability, as well as 



26] . Besides these 



decadal-scale processes resembling the Pacific decadal oscillation (PDO) 
modes, which are familiar from SSA, the spectrum of NLSA also contains modes with a 
strongly intermittent behavior in the temporal domain, characterized by five-year periods 
of high-amplitude oscillations with annual and semiannual frequencies, separated by periods 
of quiescence. Spatially, these modes describe enhanced eastward transport in the region of 
the Kuroshio current, as well as retrograde (westward) propagating temperature anomalies 
and circulation patterns resembling the subpolar and subtropical gyres. The bursting-like 



behavior of these modes, a hallmark of strongly-nonlinear dynamics, means that they carry 
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little variance of the raw signal (about an order of magnitude less than the seasonal and 
PDO modes), and as a result, they are not captured by linear SSA. 

Here, we pay particular attention to the choice of the dimension of V\. Introducing a 
spectral entropy measure D\ characterizing the change in the energy distribution among 
the modes of A 1 as I grows, we propose to select I as the minimum value beyond which Di 
becomes small. We find this to be a particularly effective way to prevent overfitting the data 
(a common issue in machine learning methods [8]), while capturing the important features 
of the signal through the spatiotemporal modes of A 1 . 

The plan of this paper is as follows. In Sec. [21 we describe our theoretical framework. In 
Sec. El we apply this framework to the upper-ocean temperature in the North Pacific sector 
of CCSM3. We discuss the implications of these results in Sec. HJ and conclude in Sec. 
A Movie showing dynamical evolution of spatiotemporal patterns is provided as Additional 
Supporting Information. 



2 Theoretical framework 

We consider that we have at our disposal samples of a time-series x t of a d- dimensional 
variable sampled uniformly with time step 5t. Here, Xt G W 1 is generated by a dynamical 
system, but observations of x t alone are not sufficient to uniquely determine the state of the 
system in phase space; i.e., our observations are incomplete. For instance, in Section |3] ahead, 
Xt will be a depth-averaged ocean temperature field restricted in the North-Pacific sector of 
CCSM3. Our objective is to produce a decomposition of x t into a set of I spatiotemporal 
patterns, 

i 

fc=i 

taking explicitly into account the fact that the underlying trajectory of the dynamical system 
lies on a nonlinear manifold M in phase space. 
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2.1 Overview of NLSA 

The methodology employed here to address this objective consists of five basic steps: (1) 
Embed the observed data in a vector space H of dimension greater than d via the method 
of delays; (2) construct a linear map A 1 taking a Hilbert space of scalar functions on M 
representing temporal patterns to the spatial patterns in H; (3) perform an SVD in a basis of 
orthonormal Laplacian eigenfunctions to extract the spatial and temporal modes associated 
with A 1 ; (5) project the modes from H to physical space M. d to obtain the spatiotemporal 
patterns x\ in ([1]). Below, we provide a description of each step. Further details of the 



procedure, as well as pseudocode, are presented in 



15] . Hereafter, we shall consider that 



M is compact and smooth, so that a well-defined spectral theory exists 



Even though 



these conditions may not be fulfilled in practice, eventually we will pass to a discrete, graph- 



theoretic description 28[, where smoothness is not an issue. 
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3JJ. Under 



Step (1) is familiar from the qualitative theory of dynamical systems 
generic conditions, the image of Xt in embedding space H = R n under the delayed-coordinate 
mapping, 

X t M- X t = (x t , X t -st, . . . , 2Gi_(,_i) st) (2) 

lies on a manifold which is diffeomorphic to M (i.e., indistinguishable from M from the point 
of view of differential geometry), provided that the dimension n of H is sufficiently large. 
Thus, given a sufficiently- long embedding window At = (<? — 1) St, we obtain a representation 
of the nonlinear manifold underlying our incomplete observations, which can be thought of as 
a curved hypersurface in Euclidean space. That hypersurface inherits a Riemannian metric 
g (i.e., an inner product between tangent vectors on M constructed from the canonical inner 
product of H) and a corresponding Riemannian measure /x = (det g) 1 ^ 2 . 

Steps (2) and (3) effectively constitute a generalization of SSA, adapted to nonlinear data 
sets. First, recall that SSA |lOj views the data matrix 

X = [X , X St , . . . , X( 8 -i)st] (3) 
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(dimensioned n x s for s samples in n-dimensional embedding space) as a linear map from 
the space of temporal patterns W (so-called chronos space) to the space of spatial patterns 
H (the topos space), defined as 

y = xf. (4) 

That is, the spatial pattern y el™ corresponding to the temporal pattern / = (/ 1; . . . , f s ) T G 
M s is given by a weighted sum of the data points Xu_i\ § t by /j. Depending on the application 
at hand, W may be replaced by a more general Hilbert space L 2 (T) over the set of temporal 
observations T = {0, St, . . . , (s — 1) St}. In either case, SSA produces a spatiotemporal 
decomposition of the signal through SVD of the linear map X, viz. 

X = UEV T , (5) 

with 

U = K, . . . ,Un], £ = diag(oi, . . . ,a min{rijS} ), V = [v 1 , . . . , v s ], 

(6) 

m eH, cTi> 0, Vi G L 2 (T). 
This leads to a rank-/ decomposition of the signal through 

l 

X\ = Y< X ti X* = u k a k v k (t), (7) 

k=l 

where v k (t) is the component of v k associated with time t gT. Similarly to (j3j), the spatial 
pattern u k corresponding to v k is given by a weighted sum of the input data, 

a k u k = Xv k . (8) 

Even though the temporal patterns v k are well-behaved functions of time [because they 
are square-integrable functions in L 2 (T)], they exhibit no notion of regularity on the non- 
linear data manifold. That is, if the system trajectory X t happens to pass from the same 
geometrical neighborhood in M at two separated time intervals, t and t', then corresponding 
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temporal patterns v^(t) and V)~(t') may differ by arbitrarily large amounts in SSA; in partic- 
ular, \vk(t) — Vk(t')\/\\Xf — X t r\\ may be unbounded as \\X t — X t /\\ — > 0. In NLSA, geometrical 
regularity is viewed as an essential ingredient of an efficient description of high- dimensional 
complex data, and is enforced by replacing the chronos space of SSA with function spaces on 
the data manifold of sufficient smoothness. More specifically, the temporal modes in NLSA 
have bounded directional derivatives on the data manifold, i.e., '52 a € a d a Vk is bounded for 
all tangent vectors £ a on M [here and in (Q summation over the tensorial indices a and b is 
over the dimensions of M\. 

A natural set of basis functions possessing this type of regularity are the eigenfunctions 
(pi of the Laplace-Beltrami operator A associated with the metric g, defined as 2?J 

A (/) = --EM/^J), (9) 

" a,b 

with J2b9 ab 9bc — 3 a c, and / an element of the Hilbert space L 2 (M, /x) of square-integrable 
scalar functions on M with inner product inherited from the Riemannian measure /i [see (TTTjl 
ahead]. The eigenfunctions of A are solutions of the eigenvalue problem 

A0, = Ai0i (10) 

(together with appropriate boundary conditions if M has boundaries) with = Ao < Ai < 
\i< • • • . Moreover, 0j can be chosen to be orthonormal with respect to the inner product 
of L 2 (M,/i), i.e., 

f »(X)<t> i (X)<l> j {X) = 6 ij . (11) 

J M 

Let <£>i be the /-dimensional subspace of L 2 (M, ji) spanned by the leading / eigenfunctions 
from ffTUl) meeting the ort honor mality condition in fTTTl) ; i.e., 

= span{0 o , • • • , <f>i-i}. (12) 
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These spaces have the following important properties. 

1. As I — >■ oo, <£>i provides a dense coverage of L 2 (M,/i). Moreover, if M is a differen- 
tiable manifold, every basis element of <Pi is a smooth function with bounded covariant 
derivative. Heuristically, I may be though of as a parameter controlling the scale on 
the data manifold resolved by the eigenfunctions spanning <fr\. 

2. For sufficiently small I and large-enough number of samples s, the values of {<fto, . . . , <pi-i} 
on the discrete samples of the data manifold in ([3]) can be computed efficiently using 

H Q, 32|. 



sparse graph-theoretic algorithms 



Even if M is not a smooth manifold 



(as is frequently the case in practice), the leading few eigenfunctions determined by 
graph-theoretic analysis can be associated with some smooth coarse-grained manifold 
reflecting the large-scale nonlinear geometry of M. 

In NLSA, the family of <&i with I between one and some upper limit are employed to 
construct temporal spaces analogous to the chronos space in classical SSA. Specifically, given 
a trajectory X t on the data manifold, a function 

i 

/ = J>&, (!3) 

i=l 

with expansion coefficients c, gives rise to a temporal process f(t) = f(X t ) for t G T. Thus, 
introducing the Hilbert space L 2 (T, n) with weighted inner product 

(/J^^^Mt)/^), (14) 
teT 

the Z-tli temporal space in NLSA is the /-dimensional subspace Vi of L 2 (T, /i) consisting of 
temporal patterns / generated by functions of the form in (1131) . 

Similarly to (jlj), for each V\ there exists a linear map A : Vi i— > H linking the temporal 
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modes to the spatial modes through the formula 



y = A\f) = J2KXt)X t f(t). (15) 



teT 



Let {0o> • • • be the orthonormal basis of <&i from (jTUI) . and {e\, . . . , e n } an orthonormal 

basis of H. The matrix elements of A 1 with this choice of bases are 



4 = (Afo), ei) = j dti(X t ) HX t )Xl (16) 

where (•, •) is the inner product of H, and X\ = (X t , e^. Computing the SVD of the n x I 
matrix [A\j\ then leads to a spectral decomposition of A 1 analogous to (jSJ), viz. 

r 

A ij = u i* a k v ik, r = min{Z, n}, (17) 
fc=i 

where and f jk are elements of n x n and / x I orthogonal matrices, respectively, and 
a l k > are singular values (ordered in order of decreasing a l k ). Each Uk = (wife, . . -u n k) is a 
spatial pattern in if expanded in the {ei} basis. Moreover, the entries (v\k, ■ ■ ■ v Vl k) are the 
expansion coefficients of the corresponding temporal pattern in the {0j} basis for Vj, leading 
to the temporal process 

i 

v k (t) = J2 v ik&-i(X t ) (18) 
i=i 

Thus, the decomposition of the signal X t in terms of the I chronos and topos modes of A 1 , 
analogous to (J7J), is 

i 

Xl = J2*?, with X* = u k a l k v k (t). (19) 

k=l 

Note that by completeness of Laplace-Beltrami eigenfunctions EfcLi 4>k{X)4>k{X') = S(X — 
X')], X\ converges to X t as I — > oo. Moreover, the spatial and temporal covariance operators 
in NLSA, A l A l * and A l *A l , can be expressed as convolutions of the spatial and temporal 
two-point correlation functions with the spectral kernel, K,i(X,X') = Yl k =i ( l > k{X)(f) k (X'), 
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weighted by the Riemannian measure; see 15( for details. 

To complete the procedure, in step (5) the X t fc are projected to c?-dimensional physical 
space by writing 

X\ = (x t ,o, xt,st, • • • , xt,( q -i)s t ), (20) 

and taking the average, 



( 21 ) 



t',T:t'-T=t 

This leads to the decomposition in ([T]). 
2.2 Graph-theoretic analysis 

In applications, the Laplace-Beltrami eigenfunctions for a finite data set are computed by 
replacing the manifold by a weighted graph G with vertices {X$ t , X 2 st, ■ ■ ■ , X s st} C M, and 
solving the eigenproblem of a transition probability matrix P defined on the vertex set of G, 
such that, for large-enough s and small-enough i, the right eigenvectors (f). of P approximate 
the corresponding Laplace-Beltrami eigenfunctions fa in fllOj) : i.e., 



;i - Ai)0, (22) 



with 0. = (0j!, . . . , is ) T and 0^ ~ (f>i(X^-i)st)- These eigenvectors satisfy an orthonormality 
condition which is the discrete analog to (jlQj) . 



y^^k(pik(pjk = Sjj, (23) 
fc=l 

with fik given by the invariant measure (leading left eigenvector) of P; namely, 

n = flp, (24) 

where Jt = . . . , /j, s ), ^ > 0, and £^ =1 ^ = 1. 
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In the present work, we evaluate P using the diffusion map (DM) algorithm of Coifman 



and Lafon 



181 ]. with a simple but important modification in the calculation of the Gaussian 



weight matrix. Specifically, we assign to each sample X^ t a local velocity in embedding 
space, £j = \\Xi$ t — X^_i)st\\, and evaluate the Gaussian weights Wij = exp(— \\Xist — 
Xjst\\ 2 /e(£i£j) 1/2 ), where ||-|| denotes the norm of H. This approach was motivated by 
the clustering algorithm developed in 33(, with the difference that in the latter paper 
is evaluated using spatial nearest neighbors, rather than the temporal nearest neighbors 
employed here. In the standard implementation of DM, e must be sufficiently small in 
order for the diffusion process represented by P to be sensitive only to local neighborhoods 
around each data point. Here, the normalization by & enforces geometrical localization even 



for e = 0(1). In 



151 ]. we found that this type of adaptive kernel significantly enhances the 



capability of NLSA to capture rare events. The remaining standard steps needed to compute 



P given W are 



lfil 



3=1 

Kn = Wij/iQiQj), 

s 

i=i 

Pi I Qi- 



(25a) 

(25b) 
(25c) 

(25d) 



The scalability of this class of algorithms to large problem sizes has been widely demon- 
strated in the machine learning and data mining literature. In particular, as a result of 
Gaussian decay of the weights, the W matrix used in implementations is made sparse, e.g., 
by truncating W to the largest b nonzero elements per row with 6/s < 1, significantly re- 
ducing the cost of the eigenvalue problem for <$.. The least-favorable scaling involves the 
pairwise distance calculation between the data samples in embedding space, which scales 
like s 2 dim(_ff) if done in brute force. Despite the quadratic scaling with s, the linear scal- 
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ing with dim(H) is of key importance, as it guarantees that NLSA does not suffer from a 



'curse of dimension" as do neural-network-based methods 



7| . Moreover, an s log s scaling 



may be realized in the pairwise-distance calculation if the dimension of H is small-enough 
for approximate fc<i-tree-based algorithms to operate efficiently 3^. In the present study, 
all eigenfunction calculations were performed on a desktop workstation using brute-force 
evaluation of pairwise distances. 



2.3 Selecting the dimension of V\ via spectral entropy 

An important question concerning parameter selection in NLSA is how to choose the pa- 
rameter I controlling the dimension of the temporal spaces VJ. Clearly, working at large I is 
desirable, since the lengthscale on the data manifold resolved by the eigenfunctions spanning 
Vi generally becomes smaller as I grows. However, for a given finite number of samples s, the 
approximation error in the eigenvectors of the graph Laplacian in (|22|) also in increases with 



I [18[. In other words, the eigenfunctions <j>i determined through graph-theoretic analysis 
will generally depend more strongly on s for large i, resulting in an overfit of the discrete 
data manifold. Thus, in practical applications it is important to establish criteria that allow 
one to determine a reasonable tradeoff between improved resolution and risk of overfitting. 

One way of addressing this issue is to monitor the growth of an operator norm for A\ 
with /, seeking plateau behavior or L-shaped curves. A standard choice of operator norm 
in this context is the Frobenius norm, which may be evaluated conveniently via the matrix 
elements A\- in ffl6l) . viz. 

n I 

PT = ££(4) 2 - ( 26 ) 

i=l j=0 

However, as we will see below, this approach may lead to considerable uncertainty in the 
choice of I. Instead, we find that a more effective method is to choose I by monitoring changes 
in the distribution of the singular values a\ with I. In particular, we propose to assess the 
behavior of the spectrum of A 1 with / via a spectral entropy measure, as follows. 
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Let 



p \ = pi (27) 



be normalized weights measuring the distribution of energy among the spatiotemporal modes 
of A 1 , which, as usual, are ordered in order of decreasing a\. Consider also the energy- 
distribution 7r' +1 over I + 1 modes determined by replicating a\ for i E [1,1], and setting the 
energy in 7r]tJ equal to a 12 . That is, we have 



a 2 a l i < ^ 

nl +1 = —p— 2 , with a l= { (21 



Here, we measure the change in the spectrum of A 1 relative to A l+1 through the relative 
entropy between the energy distributions and 7r' +1 , i.e., 

l i+i 

A = £>* +1 log%. (29) 

7T- 

i=l * 



It is a standard result in information theory and statistics [35] that Di is a non-negative 
quantity which vanishes if and only if = 7r. +1 for all i. In Sec. 14.21 we demonstrate 
that as I grows Di exhibits a sequence of spikes at small to moderate I (as qualitatively new 
features appear in the spectrum of A 1 ), until it eventually settles to small values at higher 
The practical criterion proposed here is to set I to values near that transition. 
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3 Spatiotemporal patterns in the North Pacific sector 
of a comprehensive climate model 



3.1 Data set description 

We apply the NLSA scheme presented above to study variability in the North Pacific sector 



of CCSM3; specifically, variability of the mean upper 300 m sea 
yr equilibrated control integration used by Teng and Branstator 



temperature field in the 700 
4j and Branstator and Teng 



25j in work on the initial and boundary-value predictability of subsurface temperature in 
that model. Here, our objective is to diagnose the prominent modes of variability in a time 
series generated by a coupled general circulation model. In this analysis, the x t observable 
is the mean upper 300 m temperature field sampled every month at d — 534 gridpoints 
(native ocean grid mapped to the model's T42 atmosphere) in the region 20°N-65°N and 
120°E-110°W. 



3.2 Spatiotemporal patterns revealed by NLSA 

Deferring a discussion on parameter selection to Sec. HI we begin with a description of the 
spatiotemporal patterns determined via NLSA using a two-year lagged-embedding window 
and the leading 27 Laplace-Beltrami eigenf unctions as basis functions for V\. Thus, the 
dimension of the spatial and temporal spaces is n = d x 24 = 12,816 and dim(VJ) = I = 27, 
respectively. Throughout, we work with canonical Euclidean distances between vectors in 
embedding space, 

n^-^'ii 2 = E(^-^) 2 , ( 3 °) 

i=l 

where X\ = (e«, Xt) denotes the i-th gridpoint value of the system state at time t in embed- 
ding space. For the evaluation of the graph-Laplacian eigenf unctions in fT22l) we set e = 2 
(adaptive kernel width) and b = 3500 (number of edges per data point), though our results 
are qualitatively robust with respect to changes in these parameters in the intervals e G [1, 2], 
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and b G [30,4000] (cf. the corresponding parameter values in 

We display the singular values o\ of the resulting A 1 operator in Fig. [TJ Studying the 
corresponding temporal patterns Vk from (Tl8l) in both the temporal and frequency (Fourier) 
domains, we find that the modes fall into three distinct families of periodic, low-frequency, 
and intermittent modes, illustrated in Fig. [21 and described below. The resulting spatiotem- 
poral patterns x\ from ([T]) are shown in Fig. [3] and, more clearly, in the dynamical evolution 
in Movie SI. Note that the time-lagged embedding in (j2J) is essential to the separability of 
the modes into these families; we will return to this point in Sec. 14.31 

Periodic modes. The periodic modes come in doubly-degenerate pairs (see Fig. [[)), 
and have the structure of sinusoidal waves with phase difference 7r/2 and frequency equal 
to integer multiples of 1 yr -1 . The leading periodic modes, v% and t>2, represent the annual 
(seasonal) cycle in the data. In the physical domain [Fig. [3]^c) and Movie Sl(c)], these 
modes generate an annual oscillation of temperature anomalies, whose amplitude is largest 
(~ 1°C) in the western part of the basin (~ 130°E-160°E) and for latitudes in the range 
30°N-45°N. The second set of periodic modes, V\\ and t>i2, have semiannual variability. 
These modes exhibit significant amplitude in the western part of the domain [Fig. Ete) 
and Movie Sl(e)], but also along the West Coast of North America, which is consistent 
with semiannual variability of the upper ocean associated with the California current \t 



Together with the higher-frequency overtones, the modes in this family are the standard 
eigenfunctions of the Laplacian on the circle, suggesting that the data manifold M has the 
geometry of a circle along one of its dimensions. 

Low-frequency modes. The low-frequency modes are characterized by high spectral 
power over interannual to interdecadal timescales, and strongly suppressed power over annual 
or shorter time scales. As a result, these modes represent the low-frequency variability of the 



upper ocean, which has been well-studied in the North Pacific sector of CCSM3 [4J, |24j, 125 ] . 
The leading mode in this family [i^; see Fig. EJ^b)], gives rise to a typical PDO pattern 
[Figure|3]^c) and Movie Sl(c)], where the most prominent basin-scale structure is a horseshoe- 
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like temperature anomaly pattern developing eastward along the Kuroshio current, together 
with an anomaly of the opposite sign along the west coast of North America. The higher 
modes in this family gradually develop smaller spatial features and spectral content over 
shorter time scales than V3, but have no spectral peaks on annual or shorter timescales. 

Intermittent modes. As illustrated in Fig. [3^f) and Movie Sl(f), the key feature of 
modes in this family is temporal intermittency, arising out of oscillations at annual or higher 
frequency, which are modulated by relatively sharp envelopes with a temporal extent in 
the 2-10-year regime. Like their periodic counterparts, the intermittent modes form nearly 
degenerate pairs (see Fig. [1]), and their base frequency of oscillation is an integer multiple 
of 1 year^ 1 . The resulting Fourier spectrum is dominated by a peak centered at at the base 
frequency, exhibiting some skewness towards lower frequencies. 

In the physical domain, these modes describe processes with relatively fine spatial struc- 
ture, which are activated during the intermittent bursts, and become quiescent when the 
amplitude of the envelopes is small. The most physically-recognizable aspect of these pro- 
cesses is enhanced transport along the Kuroshio current region, shown for the leading-two 
intermittent modes (vu and fis) in Figure [3](d). This process features sustained eastward 
propagation of small-scale, ~ 0.2 °C temperature anomalies during the intermittent bursts. 
The intermittent modes higher in the spectrum also encode rich spatiotemporal patterns, 
including retrograde (westward) propagating anomalies, and gyre-like patterns resembling 
the subpolar and subtropical gyres. These features are shown in Movie Sl(f), which displays 
a composite temperature anomaly field consisting of the leading four intermittent modes 
(v u , . . .,v 17 ; see Fig. [p. 
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4 Discussion 



4.1 Intermittent processes and relation to SSA 

The main result of this analysis, which highlights the importance of taking explicitly into 
account the nonlinear structure of complex high-dimensional data sets, is the existence of 
intermittent patterns of variability in the North Pacific sector of CCSM3, which are not 
accessible through SSA. This type of variability naturally emerges by restricting the temporal 
modes to lie in the low-dimensional subspaces V\ spanned by the leading Laplace-Beltrami 
eigenfunctions on the data manifold M. The inner product of these vector spaces, weighted 
by the Riemannian measure \x in (j!4p . plays an important role in the skill of NLSA of 



capturing intermittency and rare events [15] . Heuristically, this is because n(Xt) weighs 
each state X t by the volume it occupies in M, which is expected to be large for rare and/or 
intermittent states. 

As shown in Figs. [I] and [31 the spatiotemporal patterns determined through NLSA are in 
close agreement with SSA for the annual and low-frequency modes, but intermittent modes 
have no SSA counterparts. In particular, instead of the qualitatively- distinct families of 
processes described above, the SSA spectrum is characterized by a smooth decay involving 
modes of progressively higher spatiotemporal frequencies, but with no intermittent behavior 
analogous, e.g., to mode vu in Fig. [2j The a\ values associated with the intermittent modes 
and, correspondingly, their contributed variance of temperature anomaly, is significantly 
smaller than the periodic or low-frequency modes. However, this is not to say the dynamical 
significance of these modes is negligible. In fact, intermittent events, carrying low variance, 
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15] . Being able to capture 



are widely prevalent features of complex dynamical systems 
this intrinsically nonlinear behavior constitutes one of the major strengths of the NLSA 
methodology presented here. 
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4.2 Selecting the temporal-space dimension through spectral en- 
tropy 

As discussed in Sec. 12. 3[ an important issue in NLSA is the selection of the dimension of 
the temporal space V\ through the number / of Laplace-Beltrami eigenf unctions used in ( 1121) . 
Setting I too low will cause some important features to be lost, due to under- resolution on 
the data manifold. On the other hand, as / grows, eventually the algorithm will overfit the 
data if the number of samples s remains fixed. Here, we illustrate how the spectral entropy 
measure A introduced in (1291) can be used to inform the selection of suitable values of I. 

Figure H] shows the dependence of Di, as well as the Frobenius norm \\A \\ of the linear 
operators in NLSA, for values of I in the interval [1, 50] and embedding windows At = 0, 2, 
and 4 yr. As expected, ||v4'|| increases with Vi, and apart from the case with At = 0, rapidly 
approaches values close to 90% of its maximum value (occurring for I = s). Compared with 
the corresponding behavior of the norm in truncated SSA expansions (also shown in Fig. H]), 
||v4'|| follows a more staircase-like growth, but because the operator norm is dominated by 
the leading few singular values carrying the majority of the energy of the signal, it is not 
immediately obvious when I has reached an optimum value. On the other hand, the spectral 
entropy measure Di clearly transitions from a regime of saw-tooth behavior with appreciable 
magnitude at small to moderate I, to a regime of negligible amplitude at larger values of 
That transition, which occurs around I = 25 for the NLSA cases with At = 2 and 4 yr (I ~ 7 
for the case with no embedding and for SSA), indicates that increasing the dimension of Vi 
beyond those values introduces no qualitatively new spatiotemporal patterns. This provides 
justification for the I = 27 value used in Sec. El As illustrated in Fig. El the spectral gaps 
separating the low-frequency, semiannual, and intermittent modes are absent from the <j\ 
spectrum with / significantly exceeding the threshold value identified via D\. 
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4.3 The role of lagged embedding 

The embedding in ([2]) of the input data x t in H is essential to the separability of the Laplacian 
eigenfunctions into distinct families of processes. To illustrate this, in FigureOwe display the 
Laplacian eigenfunction that most-closely resembles the PDO mode of Fig. [3]^d), evaluated 
without embedding (q — 1, At — 0). It is evident from both the temporal and Fourier 
representations of that eigenfunction that the decadal process recovered in Sec. 13.21 using 
a two-year embedding window has been contaminated with high-frequency variability; in 
particular, prominent spectral lines at integer multiples of 1 yr -1 down to the maximum 
frequency of 6/yr allowed by the monthly sampling of the data. An even stronger frequency 
mixing was found to take place in the corresponding temporal SSA modes. In general, 
representing the dynamical information lost through partial observations via time-la gge d 
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30 



37|, 



embedding, as advocated in the qualitative theory of dynamical systems 
significantly enhances the quality of time-series reconstructions through either of the NLSA or 
SSA schemes. In separate calculations, we have verified that the eigenfunctions separate into 
periodic, low-frequency, and intermittent processes for embedding windows up to At = 10 
yr, including the At = 4 yr case displayed in Fig. HI 



5 Conclusions 

Combining techniques from machine learning and the qualitative theory of dynamical sys- 
tems, in this work we have presented a scheme called NLSA for spatiotemporal analysis 
of high-dimensional time series, which takes explicitly into account the nonlinear geomet- 
rical structure of data sets arising in geophysics and other applied sciences. Like classical 
SSA ljj, the method presented here utilizes time-lagged embedding and SVD to produce 
a decomposition of time series generated by partial observations of high- dimensional, com- 
plex dynamical systems into distinct spatiotemporal modes. However, the linear operators 
used here in the SVD step differs crucially from SSA in that their domains of definition are 
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low- dimensional Hilbert spaces of square-integrable functions on the nonlinear manifold M 
comprised by the data (in a suitable coarse-grained representation via a graph) . This family 
of spaces, Vi, is tailored to the nonlinear geometry of M through its Riemannian measure, 
and has high skill in capturing intermittency and rare events. As its dimension I grows, 
Vi provides a description of spatiotemporal patterns at increasingly fine resolution on the 
data manifold. Moreover, well-behaved orthonormal basis functions for these spaces can be 
computed efficiently via graph-Laplacian algorithms developed in data mining and machine 
learning; 

HQ. 

Applying this scheme to the upper-ocean temperature in the North Pacific sector of the 
CCSM3 model, we find a family of intermittent processes which are not captured by SSA. 
These processes describe eastward-propagating, small-scale temperature anomalies in the 
Kuroshio current region, as well as retrograde-propagating structures at high latitudes and 
in the subtropics (see Movie SI). Moreover, they carry little variance of the raw signal, and 
display burst-like behavior characteristic of strongly nonlinear dynamics. The remaining 
identified modes include the familiar PDO pattern of low-frequency variability, as well as 
annual and semiannual periodic processes. 

The nature of the analysis presented here is purely diagnostic. In particular, we have not 
touched upon the dynamical role of these modes in reproducing the observed dynamics, e.g., 
by triggering large-scale regime transitions [jjl, [^J. This question was addressed to some 



extent in 



15] in the context of a low-order chaotic model for the atmosphere, but remains 



open in applications involving high-dimensional complex dynamical systems with unknown 
equations of motion. Here, statistical modeling techniques, such as Bayesian hierarchical 



modeling 



model error 



3£ 1, combined with information-theoretic methods for assessing predictability and 
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40], are promising methods for training empirical models for these processes, 



and assessing their predictive skill. We plan to study these topics in future work. 
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Movie SI. Spatiotemporal patterns of the the upper 300 m temperature anomaly field 
(annual mean subtracted at each gridpoint) in the North Pacific sector of CCSM 3, evaluated 
using NLSA with I = 27 (see Fig. [TJ) and SSA. (a) Raw data, (b) Leading low-frequency 
mode from SSA. (c-f) Composite fields determined through NLSA. (c) Annual modes, v\ 
and V2- (d) Leading low-frequency mode, V3, describing the PDO. (e) Semiannual modes, 
vn and Vi2- (f) Leading four intermittent modes, ...Vn, describing variability of the 
Kuroshio current and retrograde (westward) propagating structures. The starting time of 
this animation is the same as in Fig. |2j 
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Fig 1. Singular values a\ (normalized so that a\ = 1) for the periodic, low-frequency (decadal), and 
intermittent spatiotemporal patterns evaluated through NLSA for / = 27 and embedding window 
At = 2 yr. Also shown are the corresponding singular values from SSA. 
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Fig 2. Temporal patterns Vk(t) corresponding to the singular values in Fig. [TJ Shown in the 
temporal (left-hand panels) and frequency domains (right-hand panels) are (a) the annual mode, 
vi, (b) the PDO mode, V3, (c) the leading Kuroshio mode, V14. 
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Fig 3. Spatiotemporal patterns of the the upper 300 m temperature anomaly field (annual mean 
subtracted at each gridpoint) in the North Pacific sector of CCSM 3, evaluated using NLSA with 
I = 27 (see Fig. [[]) and SSA. (a) Raw data in November of year 91 of Figure [2j (b) Leading low- 
frequency mode from SSA. (c-f) Composite fields determined through NLSA. (c) Annual modes, 
vi and V2- (d) Leading low-frequency mode, V3, describing the PDO. (e) Semiannual modes, v\\ 
and v\2- (f) First two- fold degenerate set of intermittent modes, vn and V15, describing variability 
of the Kuroshio current. The dynamical evolution of these patterns in Movie SI is much more 
revealing. 
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Fig 4. Frobenius norm of the A 1 operator and spectral entropy Di, evaluated via ()26|) and (j29p . 
versus the parameter I controlling the dimension of the temporal spaces V[ (dim(V/) = /). The 
lagged-embedding window is At = 0, 2, and 4 yr. The norm ||j4'|| has been normalized to unity at 
/ = s (number of samples). Also shown for reference are the corresponding Frobenius norms and 
spectral entropy measures evaluated by truncating the SSA expansion in ([7]) at I modes. Note that 
the Frobenius norms for NLSA and SSA cannot be compared directly because they are defined for 
linear maps acting on different vector spaces. 
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Fig 6. Leading "low-frequency" mode evaluated without embedding [cf. Fig. (2b)]. Note the 
pronounced spectral lines with period {1, 1/2, 1/3, . . . ,1/6} yr. 
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